Translation of Unknown Terms Via Web Mining for Information Retrieval
Abstract
Many English words appear in Asian-language texts, especially in news reports and technical documents. Although a foreign term and its English counterpart refer to the same concept, traditional monolingual IR erroneously treats them as independent index units. For cross-language information retrieval (CLIR), one of the major hindrances to reaching monolingual-level retrieval performance is the translation of query terms that are not found in a bilingual dictionary. This paper describes the degree to which these problems arise in Korean information retrieval and suggests a novel approach to solving them. Experimental results on the NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves IR performance.
Related resources
Exploiting the Web as the multilingual corpus for unknown query translation
Users’ cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. We propose a Web-based term transla...
Improved Cross-language Information Retrieval via Disambiguation and Vocabulary Discovery
Cross-lingual information retrieval (CLIR) allows people to find documents irrespective of the language used in the query or document. This thesis is concerned with the development of techniques to improve the effectiveness of Chinese–English CLIR. In Chinese–English CLIR, the accuracy of dictionary-based query translation is limited by two major factors: translation ambiguity and the presence ...
Mining Bilingual Data from the Web with Adaptively Learnt Patterns
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross language information retrieval. In this paper, based on the observation that bilingual data in many web pages appear collectively following similar patterns, an adaptive pattern-based bilingual data mining method is proposed. Specifically, give...
Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries
This paper proposes an efficient client-server-based query translation approach to allowing more feasible implementation of cross-language information retrieval (CLIR) services in digital library (DL) systems. A centralized query translation server is constructed to process the translation requests of cross-lingual queries from connected DL systems. To extract translations not covered by standa...
Cross Lingual Information Retrieval Using Search Engine and Data Mining
With the explosive growth of international users, distributed information and the number of linguistic resources, accessible throughout the World Wide Web, information retrieval has become crucial for users to find, retrieve and understand relevant information, in any language and form. Cross-Language Information Retrieval (CLIR) is a subfield of Information Retrieval which provides a query in ...
Publication date: 2006